Conversation
Looks promising, I'll leave it to Lukas to check the specifics. Is the model definition/file format beginning to stabilise a little? Part of the reason it's still experimental is because, as far as I can tell, upstream is still figuring out the specifics of the implementation.
@philpax Yeah, perhaps it should remain in an experimental state until it undergoes all necessary tests and stabilizes.
@skirodev At first glance this looks good. How did you create your ggmlv3 versions of the model? I would like to try 7B- and 40B-instruct before I merge this.
@LLukas22 I just uploaded the conversion code (only for f16 and f32) to Hugging Face. The quantization code is not completed yet.
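To make the conversion step concrete, here is a minimal sketch of what serializing f16/f32 tensors into a ggml-style binary can look like. This is an illustrative assumption, not the actual uploaded script: the magic value, header layout, and dtype codes below are stand-ins, and the real ggmlv3 format also carries hyperparameters and vocabulary that are omitted here.

```python
import struct
import numpy as np

# Hypothetical sketch: the magic value and ftype codes below are
# illustrative assumptions, not the exact ggmlv3 layout.
GGML_MAGIC = 0x67676A74  # "ggjt" interpreted as a little-endian u32 (assumed)

DTYPE_CODES = {np.float32: 0, np.float16: 1}  # assumed ftype codes

def write_tensor(f, name: str, tensor: np.ndarray):
    """Write one tensor record: n_dims, name length, ftype, shape, name, raw data."""
    encoded_name = name.encode("utf-8")
    dtype_code = DTYPE_CODES[tensor.dtype.type]
    f.write(struct.pack("<iii", tensor.ndim, len(encoded_name), dtype_code))
    # ggml stores dimensions fastest-varying first, i.e. reversed vs. numpy
    for dim in reversed(tensor.shape):
        f.write(struct.pack("<i", dim))
    f.write(encoded_name)
    f.write(tensor.tobytes())

def convert(tensors: dict, out_path: str):
    """Write a magic header followed by every tensor record."""
    with open(out_path, "wb") as f:
        f.write(struct.pack("<I", GGML_MAGIC))
        for name, tensor in tensors.items():
            write_tensor(f, name, tensor)
```

The real script additionally has to rename Hugging Face tensor names to the ones the Rust inference code expects, which is where conversion bugs most often hide.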
Thanks! You can probably just use
Thanks for the suggestion! I'll try using llm to quantize the converted models and see how it works.
Alright, I used your script to create ggml versions of falcon-7B and falcon-40B and quantized them with llm. 7B works as expected. Output:
Sadly 40B produces gibberish 😞. Output:
I don't know if the inference code is the problem or if the conversion script/quantization corrupted some tensors.
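One hypothetical way to narrow down whether the conversion corrupted tensors (not something from this PR, just a debugging sketch) is to compare cheap per-tensor fingerprints between the source checkpoint and the converted tensors; a corrupted or misnamed tensor usually shows up as a shape or statistics mismatch:

```python
import numpy as np

def tensor_stats(t: np.ndarray):
    """Cheap fingerprint of a tensor: shape plus a few summary statistics."""
    t = t.astype(np.float64)
    return (t.shape, float(t.mean()), float(t.std()), float(np.abs(t).max()))

def find_mismatches(original: dict, converted: dict, rtol: float = 1e-3):
    """Return names of tensors whose fingerprints diverge between the two sets."""
    bad = []
    for name, t in original.items():
        if name not in converted:
            bad.append(name)  # tensor missing entirely after conversion
            continue
        s1, s2 = tensor_stats(t), tensor_stats(converted[name])
        if s1[0] != s2[0] or not np.allclose(s1[1:], s2[1:], rtol=rtol, atol=1e-6):
            bad.append(name)  # shape or statistics drifted
    return bad
```

If the fingerprints all match but inference still produces gibberish, the problem is more likely in the inference graph (e.g. the multi-query attention layout that differs between Falcon 7B and 40B) than in the conversion.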
My apologies, could you retest it now?
It got a bit better, but there is probably still something wrong. Maybe we should wait for the ggml update and revisit it then?
Absolutely, I agree with your suggestion. Let's await the ggml update and take another look at it when the time comes. Thank you for conducting the testing.
Could you try to pull the latest main into this? It should contain the latest ggml version.
Sure, I already pulled the latest main branch for the updated ggml version.
Falcon 40B still produces gibberish:
Maybe I need to reconvert/quantize my model 🤔
In my testing, Falcon 40B now runs inference successfully:

cargo run --release -- infer -a falcon -m "./models/falcon-40b-instruct-ggmlv3-q4_0.bin" --batch-size 512 -p "write a story about falcon" -r tiiuae/falcon-40b --stats

The majestic bird of prey soared through the sky, its wingspan stretching outwards as it searched for prey. Its sharp eyes scanned the horizon, and in an instant, it spotted movement below. With powerful strokes of its wings, it dove towards its target at incredible speeds before striking with lightning-fast precision. The falcon was a symbol of strength, agility, and intelligence – an awe-inspiring creature that commanded respect from all who saw it soar above.

However, the embedded tokenizer code still needs modification: Falcon does not require adding a BOS token id and has some special tokens, which may depend on the implementation of the GGUF format.
Good job 👍 I'll give this another look tomorrow and if everything works I'm gonna merge it.
Code looks good! Will leave it to Lukas to do final tests but OK from my end. Hopefully we can get the additional information from GGUF soon.
LGTM 👍 Now it even works with the fp16 memory :D
Does it work with Metal?
@jempabroni Maybe, it depends on whether all the necessary operations have already been ported to Metal shaders. You can try using it, and if it gives you an invalid-operation error, it's not supported yet.
This PR has already passed tests with the Falcon mini-series models, but due to limitations of my device, I haven't tested it with the original Falcon series models. #293